feat(keypoint-detection): add COCO OKS-AP evaluation by jeon185 · Pull Request #949 · microsoft/winml-cli

jeon185 · 2026-06-23T18:37:28Z

Adds the eval stage for keypoint-detection (ViTPose), so the COCO-keypoint models from #284 now go through config -> build -> perf -> eval. Stacked on #905 (the config/build/perf enablement) - that one should go in first.

What's here:

- metrics/keypoint.py - KeypointAPMetric. Computes the COCO keypoint score (OKS-based AP over 0.50:0.95) with pycocotools COCOeval, the same way object-detection already reuses the COCO mAP protocol.
- keypoint_detection_evaluator.py - top-down evaluator. transformers has no keypoint-detection pipeline, so it runs the image processor and ONNX model directly: for each ground-truth person box it does preprocess -> model -> post_process_pose_estimation and scores against the GT keypoints. ViTPose is exported with a static batch of 1, so each person crop runs separately and the heatmaps are stacked back together for post-processing. It uses the GT person boxes (standard COCO top-down protocol - keeps the score about pose accuracy, not detection).
- scripts/build_coco_keypoints.py - builds a local COCO val keypoints dataset. COCO has no script-free HF mirror for person keypoints, so this downloads the annotations once and fetches images individually, which means a small subset doesn't need the full image zip.
- Schema, evaluator registry, default dataset, and unit tests for the metric and evaluator.
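
The per-crop loop in the evaluator can be pictured roughly as below. This is a minimal sketch, not the code in this PR: `run_top_down`, `fake_model`, and the `Crop`/`Heatmaps` aliases are hypothetical names, and plain Python lists stand in for the preprocessed crops and the heatmap arrays.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-ins: a "crop" is one preprocessed person input and
# "heatmaps" is the model output for that single person.
Crop = Sequence[float]
Heatmaps = List[float]


def run_top_down(
    crops: Sequence[Crop],
    run_batch1_model: Callable[[List[Crop]], List[Heatmaps]],
) -> List[Heatmaps]:
    """Run a static-batch-1 model once per person crop and stack the outputs."""
    stacked: List[Heatmaps] = []
    for crop in crops:
        outputs = run_batch1_model([crop])  # the batch dimension is fixed at 1
        if len(outputs) != 1:
            raise ValueError(f"expected 1 output for a batch of 1, got {len(outputs)}")
        stacked.append(outputs[0])
    return stacked  # same order as the input person boxes


def fake_model(batch: List[Crop]) -> List[Heatmaps]:
    """Toy model that, like a static export, rejects any batch size other than 1."""
    if len(batch) != 1:
        raise ValueError("static batch of 1 only")
    return [[2.0 * value for value in batch[0]]]
```

Keeping the stacked outputs in the same order as the ground-truth boxes is what lets a single post-processing call map each result back to its person.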

Verified on the five COCO 17-keypoint models (vitpose-base-simple and vitpose-plus-{small,base,large,huge}): config -> build -> perf -> eval all pass and return COCO AP/AR. AP rises with model size as you'd expect. Absolute numbers are on the low side right now because the build quantizes with random calibration data, but relative comparison holds.

synthpose-vitpose-huge-hf - not covered yet

This is the one model from #284 that this PR does not evaluate. It predicts 52 anatomical keypoints instead of COCO's 17, so it can't be scored against COCO ground truth - the keypoint sets don't line up and OKS is only defined when they do.
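
To make the dependency on matching keypoint sets concrete, here is a minimal pure-Python sketch of OKS for a single person, following the COCO definition as I understand it. It is not the code in this PR (which delegates to pycocotools); the function name `oks` and its argument layout are assumptions, and the case with no labeled keypoints is simplified to return 0.0.

```python
import math
import sys
from typing import Sequence, Tuple

# Per-keypoint sigmas used by the COCO keypoint evaluation (17 keypoints).
COCO_SIGMAS = [0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
               0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089]


def oks(
    pred_xy: Sequence[Tuple[float, float]],
    gt_xy: Sequence[Tuple[float, float]],
    visibility: Sequence[int],
    area: float,
    sigmas: Sequence[float] = COCO_SIGMAS,
) -> float:
    """Object keypoint similarity between one prediction and one ground truth."""
    n = len(sigmas)
    if not (len(pred_xy) == len(gt_xy) == len(visibility) == n):
        # Each keypoint needs its own sigma, so the sets must line up one-to-one.
        raise ValueError(f"expected {n} keypoints everywhere")
    total, labeled = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred_xy, gt_xy, visibility, sigmas):
        if v <= 0:  # unlabeled ground-truth keypoints do not count
            continue
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k2 = (2.0 * sigma) ** 2
        # Distance is normalized by object area and the per-keypoint constant.
        total += math.exp(-d2 / (2.0 * (area + sys.float_info.epsilon) * k2))
        labeled += 1
    # Simplification: return 0.0 when nothing is labeled.
    return total / labeled if labeled else 0.0
```

With 52 predicted keypoints there is no agreed sigma or ground-truth point for 35 of them, which is why the score is undefined rather than merely low.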

How it's handled for now: the metric checks the keypoint count up front and raises a clear, actionable error instead of failing with a numpy broadcast error deep inside pycocotools.
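
A guard of this kind might look roughly like the following. This is an illustrative sketch only: `check_keypoint_count` and the wording of its message are hypothetical, not the check that ships in metrics/keypoint.py.

```python
def check_keypoint_count(predicted: int, expected: int) -> None:
    """Fail early, with a readable message, if the keypoint sets cannot match."""
    if predicted != expected:
        raise ValueError(
            f"Model predicts {predicted} keypoints but the dataset defines "
            f"{expected}; OKS needs matching keypoint sets. Use a dataset and "
            f"sigmas with {predicted} keypoints for this model."
        )
```

Calling it once before any scoring, for example `check_keypoint_count(52, 17)`, surfaces the mismatch at the metric boundary instead of inside array arithmetic.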

Idea for finishing it: KeypointAPMetric already takes sigmas and keypoint_names as arguments, so the main missing piece is a dataset with SynthPose's 52-keypoint ground truth plus the matching OKS sigmas. I'd rather agree on the dataset and sigmas in review before adding that - happy to land it in this PR or as a follow-up, whichever you prefer.

Refs #284.

Adds the eval stage for keypoint-detection (ViTPose), completing config -> build -> perf -> eval for the COCO-keypoint models in #284.

- metrics/keypoint.py: KeypointAPMetric computes the COCO keypoint score (OKS-based AP over 0.50:0.95) via pycocotools COCOeval, the same way object-detection reuses the COCO mAP protocol.
- keypoint_detection_evaluator.py: top-down evaluator. transformers has no keypoint-detection pipeline, so it drives the image processor and ONNX model directly - per ground-truth person box it runs preprocess -> model -> post_process_pose_estimation and scores against GT keypoints. ViTPose exports a static batch of 1, so each person crop runs separately and the heatmaps are stacked for post-processing. Uses GT person boxes (standard COCO top-down, isolates pose accuracy from detection).
- scripts/build_coco_keypoints.py: builds a local COCO val keypoints dataset; downloads annotations once and fetches images individually so a subset does not need the full image zip.
- Schema, evaluator registry, default dataset, unit tests.

Verified on the five COCO 17-keypoint models (vitpose-base-simple, vitpose-plus-{small,base,large,huge}): config -> build -> perf -> eval all pass and return COCO AP/AR.

synthpose-vitpose-huge-hf is not covered yet. It predicts 52 anatomical keypoints rather than COCO's 17, so it can't be scored against COCO ground truth - the keypoint sets don't line up, and OKS is only defined when they do. Right now the metric detects this mismatch and raises a clear error instead of failing deep inside pycocotools. KeypointAPMetric already takes sigmas and keypoint_names as arguments, so supporting SynthPose mainly needs a dataset with its 52-keypoint ground truth plus the matching OKS sigmas; I'd rather confirm the dataset/sigmas choice in review before adding that. Open to suggestions on whether to land it here or as a follow-up.

Refs #284.

jeon185 requested a review from a team as a code owner June 23, 2026 18:37

jeon185 wants to merge 1 commit into feat/keypoint-detection-enablement from feat/keypoint-detection-eval
